Abstract: A packing lemma is proved using a setting where
the channel is a binary-input discrete memoryless channel
$(\mathcal{X}, w(y|x), \mathcal{Y})$, the code is selected at random subject to
parity-check constraints, and the decoder is a joint typicality
decoder. The ensemble is characterized by (i) a pair of fixed
parameters $(H, q)$ where $H$ is a parity-check matrix and $q$ is
a channel input distribution and (ii) a random parameter $S$
representing the desired parity values. For a code of length $n$,
the constraint is sampled from $p_S(s) = \sum_{x^n \in \mathcal{X}^n} \phi(s,x^n)\, q^n(x^n)$
where $\phi(s,x^n)$ is the indicator function of the event $\{s = x^n H^T\}$
and $q^n(x^n) = \prod_{i=1}^{n} q(x_i)$. Given $S = s$, the codewords are chosen
conditionally independently from $p_{X^n|S}(x^n|s) \propto \phi(s,x^n)\, q^n(x^n)$.
It is shown that the probability of error for this ensemble
decreases exponentially in $n$ provided the rate $R$ is kept bounded
away from $I(X;Y) - \frac{1}{n} I(S;Y^n)$ with $(X,Y) \sim q(x)\, w(y|x)$ and
$(S,Y^n) \sim p_S(s) \sum_{x^n} p_{X^n|S}(x^n|s) \prod_{i=1}^{n} w(y_i|x_i)$. In the special case
where $H$ is the parity-check matrix of a standard polar code, it
is shown that the rate penalty $\frac{1}{n} I(S;Y^n)$ vanishes as $n$ increases.
The paper also discusses the relation between ordinary polar
codes and random codes based on polar parity-check matrices.

I. Introduction 

Packing and covering lemmas are basic building blocks 
of coding theorems in information theory. The book by El
Gamal and Kim [1] exemplifies this; it relies on a small
number of packing and covering lemmas (such as Lemma
3.1 [1, p. 46] and Lemma 3.3 [1, p. 64]) to prove a vast
number of coding theorems for multi-terminal source and
channel coding problems. Unfortunately, the packing and
covering lemmas used for proving theorems in a clean way
rely on joint, or at least pairwise, independence among
the codewords. Joint or pairwise independence is too
strong an assumption for various practical code ensembles,
including those for polar codes. The goal of this paper is to
prove a packing lemma under less stringent conditions on 
the code ensemble. The motivation behind this work is to 
develop packing and covering lemmas that are applicable 
to polar codes so that existing proofs based on standard 
code ensembles can be translated readily to similar proofs 
for polar codes. In this paper, we address only the packing 
problem. The results are preliminary. More work is needed 
to establish the desired links between random-coding
methods and explicit polar code constructions.

In Sect. II we review the random-coding method in
the absence of any constraints. In Sect. III we extend the
method of Sect. II to the case of random coding subject to
parity-check constraints. In Sect. IV we further specialize
the results to the case of parity-check matrices obtained
from polar coding. The paper concludes in Sect. V with a
summary and remarks.

II. Standard random-coding method 

This section reviews the standard random-coding 
method. We follow the presentation given in [1, Sect. 3.1.2]
and, for the most part, adopt the notation and conventions 
there. 

Consider a communication system employing block
coding over a discrete memoryless channel (DMC)
$(\mathcal{X}, w(y|x), \mathcal{Y})$ with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$,
and transition probabilities $w(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Let $R$
denote the code rate, $n$ the length of the codewords, and
$c = \{x^n(1), \ldots, x^n(2^{\lceil nR \rceil})\}$ the code itself. To send message
$m$, one transmits the codeword $x^n(m)$ into the channel; in
response, the channel outputs a word $y^n$ with probability

$$w^n(y^n|x^n(m)) = \prod_{i=1}^{n} w(y_i|x_i(m)); \tag{1}$$

and the decoder in the system maps $y^n$ to a decision
$\hat{m} \in [1 : 2^{\lceil nR \rceil}] \cup \{e\}$ where $e$ is a special symbol indicating
decoder failure. Here, the decoder is assumed to be
a joint typicality decoder designed for a channel input-output
ensemble $(X,Y) \sim q(x)\, w(y|x)$ where $q(x)$ is a given
probability distribution on $\mathcal{X}$. Given $y^n$, the joint typicality
decoder outputs $\hat{m}(y^n) = j$ if $j$ is the unique message index
in $[1 : 2^{\lceil nR \rceil}]$ such that $(x^n(j), y^n) \in \mathcal{T}_\epsilon^{(n)}(X,Y)$; otherwise,
the output is $\hat{m} = e$. Here, $\mathcal{T}_\epsilon^{(n)}(X,Y)$ is defined as in [1, p. 27],
namely, as the set of all $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ such that the
inequalities

$$|\pi(x,y|x^n,y^n) - q(x)\, w(y|x)| \le \epsilon\, q(x)\, w(y|x)$$

hold for each $(x,y) \in \mathcal{X} \times \mathcal{Y}$, where $\pi(x,y|x^n,y^n)$ is the
fraction of times $(x,y)$ appears as a coordinate of $(x^n, y^n)$.
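
The typicality test used by the decoder can be sketched in a few lines. The following is only an illustration: the function name `is_jointly_typical`, the dictionary layout of `q` and `w`, and the binary symmetric channel with crossover 0.1 are assumptions made for the example and do not appear in the paper.

```python
from collections import Counter

def is_jointly_typical(xs, ys, q, w, eps):
    """Check the typicality inequalities for every pair (x, y) in the alphabet.

    q[x] is the input distribution and w[x][y] the transition probability
    (illustrative data layout, not taken from the paper).
    """
    n = len(xs)
    counts = Counter(zip(xs, ys))
    for x, qx in q.items():
        for y, wyx in w[x].items():
            target = qx * wyx                  # q(x) w(y|x)
            empirical = counts[(x, y)] / n     # pi(x, y | x^n, y^n)
            if abs(empirical - target) > eps * target:
                return False
    return True

# Hypothetical example: uniform inputs over a BSC with crossover 0.1.
q = {0: 0.5, 1: 0.5}
w = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
xs = [0] * 10 + [1] * 10
ys = [0] * 9 + [1] + [1] * 9 + [0]             # one flip per input value
typical = is_jointly_typical(xs, ys, q, w, eps=0.1)
atypical = is_jointly_typical(xs, [1 - y for y in ys], q, w, eps=0.1)
```

In this toy case the empirical pair frequencies of `(xs, ys)` match $q(x)w(y|x)$, so the pair passes the test, whereas complementing every output symbol pushes the frequency of $(0,1)$ far outside the allowed band.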

In random-coding analysis of such a system, one regards
the code $c$ as a sample of a random code $\mathcal{C}$, drawn with
probability

$$p_{\mathcal{C}}(c) = \prod_{j=1}^{2^{\lceil nR \rceil}} q^n(x^n(j)), \tag{2}$$

where $x^n(j)$ denotes the $j$th codeword in $c$ and $q^n(x^n) = \prod_{i=1}^{n} q(x_i)$.
The entire system is represented by an
ensemble $(M, \mathcal{C}, Y^n, \hat{M})$ with a probability assignment
$p_{M,\mathcal{C},Y^n,\hat{M}}(m, c, y^n, \hat{m})$ of the form

$$p_M(m)\, p_{\mathcal{C}}(c)\, p_{Y^n|M,\mathcal{C}}(y^n|m,c)\, p_{\hat{M}|\mathcal{C},Y^n}(\hat{m}|c,y^n), \tag{3}$$

where $p_M(m)$ is uniform on $[1 : 2^{\lceil nR \rceil}]$, $p_{Y^n|M,\mathcal{C}}(y^n|m,c)$ is
given by (1) with $x^n(m)$ as the $m$th codeword of $c$, and $\hat{M}$
is a function of $(\mathcal{C}, Y^n)$ as determined by the operation of
the joint typicality decoder.

Let $\mathcal{E} = \{\hat{M} \ne M\}$ denote the error event and $P(\mathcal{E})$ the
probability of error w.r.t. the above ensemble. The goal
of the random-coding analysis is to show that, for any
fixed $R < I(X;Y)$ with $(X,Y) \sim q(x)\, w(y|x)$, the probability
of error $P(\mathcal{E})$ goes to zero as the block length $n$ increases.
The analysis begins by observing that, due to symmetry,
$P(\mathcal{E}) = P(\mathcal{E}|M=1)$. Then, one defines $\mathcal{E}_1 = \{(X^n(1), Y^n) \notin \mathcal{T}_\epsilon^{(n)}\}$
and $\mathcal{E}_2 = \{(X^n(j), Y^n) \in \mathcal{T}_\epsilon^{(n)} \text{ for some } j \ne 1\}$, so that
one can write $P(\mathcal{E}|M=1) = P(\mathcal{E}_1 \cup \mathcal{E}_2|M=1) \le P(\mathcal{E}_1|M=1) + P(\mathcal{E}_2|M=1)$.
By standard results in large-deviation analysis,
it is observed that $P(\mathcal{E}_1|M=1)$ goes to 0 (exponentially) in
$n$. For the second term, the union bound is used to write

$$P(\mathcal{E}_2|M=1) \le \sum_{j=2}^{2^{\lceil nR \rceil}} P(\mathcal{D}_j|M=1) \tag{4}$$

where $\mathcal{D}_j = \{(X^n(j), Y^n) \in \mathcal{T}_\epsilon^{(n)}\}$; then, a joint typicality
lemma is invoked to bound each term in the union bound
as

$$P(\mathcal{D}_j|M=1) \doteq 2^{-nI(X;Y)}, \quad j \ne 1, \tag{5}$$

(with $\doteq$ understood as equality to first order in the exponent),
which establishes that $P(\mathcal{E}_2|M=1) \doteq 2^{n(R - I(X;Y))}$. This completes
the proof that $P(\mathcal{E})$ goes to zero (exponentially) in $n$
provided $R < I(X;Y)$. If one chooses $q(x)$ as a distribution
that maximizes $I(X;Y)$, one obtains a proof of achievability
of the channel capacity $C = \max_{q(x)} I(X;Y)$.

III. Random coding under constraints 


In this section, we consider the same channel coding
problem as in Sect. II with the difference that here the
code ensemble $\mathcal{C}$ is subject to certain constraints. The
target application of the method developed in this section is
polar coding; however, for broader applicability and a wider
perspective, the initial formulation is given in a fairly general
manner.

A. Code generation under constraints

The constraints on code generation will be represented
by a parameter $s$ taking values over a space $\mathcal{S}$. We will
consider codes of length $n$ and let $x^n \in \mathcal{X}^n$ denote a
generic channel input word of length $n$. We will model
the constraints by a function $\phi : \mathcal{S} \times \mathcal{X}^n \to \{0,1\}$ such that
$\phi(s,x^n) = 1$ iff $x^n$ satisfies the constraint $s$. As a simple
example, let $\mathcal{S} = \{o, e\}$ and let $\phi(e,x^n) = 1$ iff the parity
of $x^n$ is even and $\phi(o,x^n) = 1$ iff the parity of $x^n$ is odd. A
more general parity-check constraint will be treated in the
next section.

We will say that a constraint function $\phi$ is symmetric if
there exist non-zero reals $(a_s : s \in \mathcal{S})$ such that

$$\sum_{s \in \mathcal{S}} a_s\, \phi(s,x^n) = 1, \quad \text{for all } x^n \in \mathcal{X}^n. \tag{6}$$

For example, the odd-even parity constraint is symmetric
with $a_s = 1$. We will restrict attention to symmetric constraint
functions.

The random code ensembles that we will consider will be
denoted as $(S, \mathcal{C})$ with $S$ denoting a random constraint variable
that takes values in $\mathcal{S}$ and $\mathcal{C} = \{X^n(1), \ldots, X^n(2^{\lceil nR \rceil})\}$
denoting a code chosen at random subject to the constraint
$S$. We take $q(x)$, the target channel input distribution, as
given. For any particular constraint $s \in \mathcal{S}$ and code
$c = \{x^n(1), \ldots, x^n(2^{\lceil nR \rceil})\}$, we specify the probability assignment
on $(S, \mathcal{C})$ as

$$p_{S,\mathcal{C}}(s,c) = p_S(s) \prod_{m=1}^{2^{\lceil nR \rceil}} q_s(x^n(m)) \tag{7}$$

where

$$p_S(s) = a_s \sum_{x^n} \phi(s,x^n)\, q^n(x^n), \quad s \in \mathcal{S}, \tag{8}$$

and

$$q_s(x^n) = \frac{\phi(s,x^n)\, q^n(x^n)}{\sum_{\tilde{x}^n} \phi(s,\tilde{x}^n)\, q^n(\tilde{x}^n)}, \quad x^n \in \mathcal{X}^n. \tag{9}$$

Thus, the codewords $\{X^n(m)\}$ are selected in a conditionally
i.i.d. manner from $q_s$, given the constraint $S = s$. Note that
the marginal distribution of individual codewords is given
by

$$p_{X^n(m)}(x^n) = \sum_{s} p_S(s)\, q_s(x^n) = q^n(x^n), \quad x^n \in \mathcal{X}^n, \tag{10}$$

which is in agreement with the target channel-input distribution.
Also note that the channel output follows a product-form
distribution

$$p_{Y^n}(y^n) = t^n(y^n) = \prod_{i=1}^{n} t(y_i) \tag{11}$$

with $t(y) = \sum_{x} q(x)\, w(y|x)$.
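
As a sanity check on the construction, the small sketch below builds $p_S$ and $q_s$ by exhaustive enumeration for the odd-even parity constraint and confirms numerically that the codeword marginal equals $q^n$. The helper name `constrained_ensemble`, the length $n = 3$, and the input distribution $(0.7, 0.3)$ are illustrative assumptions, not values from the paper.

```python
import itertools
import math

def constrained_ensemble(phi, constraints, a, q, n):
    """Build q^n, p_S and {q_s} by brute force for a symmetric constraint phi."""
    words = list(itertools.product([0, 1], repeat=n))
    qn = {x: math.prod(q[b] for b in x) for x in words}
    p_S, q_s = {}, {}
    for s in constraints:
        mass = sum(phi(s, x) * qn[x] for x in words)   # sum_x phi(s,x) q^n(x)
        p_S[s] = a[s] * mass
        q_s[s] = {x: phi(s, x) * qn[x] / mass for x in words}
    return qn, p_S, q_s

def parity_phi(s, x):
    """Odd-even parity constraint: 'e' accepts even-weight words, 'o' odd ones."""
    return int((sum(x) % 2 == 0) == (s == "e"))

# Hypothetical parameters chosen only for illustration.
q = [0.7, 0.3]
qn, p_S, q_s = constrained_ensemble(parity_phi, ["e", "o"], {"e": 1, "o": 1}, q, n=3)

# Largest deviation between sum_s p_S(s) q_s(x^n) and q^n(x^n).
max_gap = max(abs(sum(p_S[s] * q_s[s][x] for s in p_S) - qn[x]) for x in qn)
```

Here `max_gap` is zero up to floating-point rounding, which is the marginal identity stated above; the same routine accepts any other symmetric constraint function in place of `parity_phi`.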

B. Analysis of probability of error 

We now analyze the average performance of the constrained
code ensemble defined by (7). As in Sect. II we
assume that the message random variable $M$ is uniformly
distributed over $[1 : 2^{\lceil nR \rceil}]$ and that a joint typicality decoder
is being used. The joint ensemble for the system will be
$(M, S, \mathcal{C}, Y^n, \hat{M})$ with a probability assignment

$$p_M(m)\, p_{S,\mathcal{C}}(s,c)\, p_{Y^n|M,\mathcal{C}}(y^n|m,c)\, p_{\hat{M}|\mathcal{C},Y^n}(\hat{m}|c,y^n), \tag{12}$$

which is the same as (3), except here the code ensemble is
defined by (7). A property of this ensemble, which will be
important in the sequel, is the independence of $(S, Y^n)$ and
$M$. This can be verified by writing

$$p_{S,Y^n|M}(s,y^n|m) = \sum_{x^n} p_{S,X^n(m),Y^n|M}(s,x^n,y^n|m) = \sum_{x^n} p_S(s)\, q_s(x^n)\, w^n(y^n|x^n),$$

and observing that the final sum is independent of $m$.

We now turn to the error analysis and define the error
events $\mathcal{E}$, $\mathcal{E}_1$, $\mathcal{E}_2$ as in Sect. II. As before, by symmetry,
we have $P(\mathcal{E}) \le P(\mathcal{E}_1|M=1) + P(\mathcal{E}_2|M=1)$. As in Sect. II,
the first term $P(\mathcal{E}_1|M=1)$ goes to zero exponentially in
$n$. To bound the second term $P(\mathcal{E}_2|M=1)$, we will use an
argument involving the sets $\mathcal{D}_j$ as defined in Sect. II as
well as the mutual information random variable

$$i(s;y^n) = \log \frac{p_{S,Y^n}(s,y^n)}{p_S(s)\, p_{Y^n}(y^n)} = \log \frac{p_{S,Y^n}(s,y^n)}{p_S(s)\, t^n(y^n)}, \tag{13}$$

and the event

$$\mathcal{A} = \{i(S;Y^n) > n\gamma\}. \tag{14}$$

The $\gamma$ in the definition of $\mathcal{A}$ is a real number that will
be specified later. In terms of these, we have the following
bound.

$$P(\mathcal{E}_2|M=1) = P(\mathcal{E}_2 \cap \mathcal{A}|M=1) + P(\mathcal{E}_2 \cap \mathcal{A}^c|M=1)$$
$$\le P(\mathcal{A}|M=1) + \sum_{j=2}^{2^{\lceil nR \rceil}} P(\mathcal{D}_j \cap \mathcal{A}^c|M=1)$$
$$= P(\mathcal{A}) + (2^{\lceil nR \rceil} - 1)\, P(\mathcal{D}_2 \cap \mathcal{A}^c|M=1),$$

where in the last line we replaced $P(\mathcal{A}|M=1)$ with $P(\mathcal{A})$ by
noting that $\mathcal{A}$, being an event defined in terms of $(S, Y^n)$,
is independent of $\{M = 1\}$. We define $\mathcal{B}$ as the set of
all $(s, x^n, y^n) \in \mathcal{S} \times \mathcal{X}^n \times \mathcal{Y}^n$ such that $(x^n, y^n) \in \mathcal{T}_\epsilon^{(n)}$ and
$i(s;y^n) \le n\gamma$, and continue as follows.

$$P(\mathcal{D}_2 \cap \mathcal{A}^c|M=1) = \sum_{(s,x^n,y^n) \in \mathcal{B}} p_{S,Y^n}(s,y^n)\, q_s(x^n)$$
$$\stackrel{(a)}{\le} \sum_{(s,x^n,y^n) \in \mathcal{B}} 2^{n\gamma}\, p_S(s)\, t^n(y^n)\, q_s(x^n)$$
$$\stackrel{(b)}{\le} \sum_{(s,x^n,y^n) \in \mathcal{S} \times \mathcal{T}_\epsilon^{(n)}} 2^{n\gamma}\, p_S(s)\, t^n(y^n)\, q_s(x^n)$$
$$\stackrel{(c)}{=} \sum_{(x^n,y^n) \in \mathcal{T}_\epsilon^{(n)}} 2^{n\gamma}\, t^n(y^n)\, q^n(x^n)$$
$$\stackrel{(d)}{\doteq} 2^{-n(I(X;Y) - \gamma)}$$

where (a) follows by the fact that, for any $(s, x^n, y^n) \in \mathcal{B}$,
$p_{S,Y^n}(s,y^n) \le 2^{n\gamma}\, p_S(s)\, t^n(y^n)$, (b) by extending the range of
the sum from $\mathcal{B}$ to the larger set $\mathcal{S} \times \mathcal{T}_\epsilon^{(n)}$, (c) by carrying
out the sum over $s \in \mathcal{S}$ and using (10), and (d) by the joint typicality
lemma [1, p. 43]. Collecting the results, we have the bound

$$P(\mathcal{E}_2|M=1) \stackrel{\cdot}{\le} P(\mathcal{A}) + 2^{n(R - I(X;Y) + \gamma)}.$$

To keep the upper bound on $P(\mathcal{E}_2|M=1)$ under control,
we need a large enough $\gamma$ so that $P(\mathcal{A})$ is small, but also
a rate $R$ smaller than $I(X;Y) - \gamma$. These two conflicting
objectives show that there is a trade-off between
performance and structure. For a more quantitative
asymptotic statement, consider a sequence of ensembles
$\{(S_n, \mathcal{C}_n)\}$ with each ensemble in the sequence having the
same code rate $R$. Let $P_{e,n}$ denote the probability of error
for the $n$th ensemble. Let

$$\gamma^* = \inf\left\{\gamma : \limsup_{n \to \infty} P\big(i(S_n;Y^n) > n\gamma\big) = 0\right\}. \tag{15}$$

Then, $P_{e,n}$ goes to zero if $R < I(X;Y) - \gamma^*$. If the sequence
$\{(S_n, \mathcal{C}_n)\}$ has a convergence property such as

$$\limsup_{n \to \infty} P\big(|i(S_n;Y^n) - I(S_n;Y^n)| > n\epsilon\big) = 0,$$

for any fixed $\epsilon > 0$, then we may take

$$\gamma^* = \limsup_{n \to \infty} \left\{\frac{1}{n} I(S_n;Y^n)\right\}. \tag{16}$$

In any case, it is apparent that the cost of placing constraints
on the code is a rate penalty given by $\gamma^*$. We
summarize the above discussion as follows.

Lemma 1. Let $\{(S_n, \mathcal{C}_n)\}$ be a sequence of constrained code
ensembles indexed by code length $n$, with each ensemble in
the sequence defined by (7) and having a common rate $R$.
Let $P_{e,n}$ denote the probability of error for the $n$th ensemble,
under joint typicality decoding. Then, $P_{e,n}$ goes to zero as $n$
increases provided $R < I(X;Y) - \gamma^*$ where $\gamma^*$ is defined by
(15).
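
For very short codes, the tail probability $P(i(S;Y^n) > n\gamma)$ that enters the bound can be evaluated exactly by enumeration. The sketch below does this for a parity-check constraint of the form $s = x^n H^T$ over a binary-input channel; the function names, the $2 \times 4$ matrix `H`, and the binary symmetric channel with crossover 0.1 are assumptions chosen purely for illustration.

```python
import itertools
import math

def joint_syndrome_output(H, q, w):
    """Exact p_{S,Y^n}(s, y^n) = sum over x^n with x^n H^T = s of q^n(x^n) w^n(y^n|x^n)."""
    n = len(H[0])
    joint = {}
    for x in itertools.product([0, 1], repeat=n):
        s = tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)
        px = math.prod(q[b] for b in x)
        for y in itertools.product([0, 1], repeat=n):
            p = px * math.prod(w[a][b] for a, b in zip(x, y))
            joint[(s, y)] = joint.get((s, y), 0.0) + p
    return joint

def prob_density_exceeds(H, q, w, gamma):
    """Return P(i(S; Y^n) > n * gamma), with the information density in bits."""
    n = len(H[0])
    joint = joint_syndrome_output(H, q, w)
    p_s, p_y = {}, {}
    for (s, y), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p for (s, y), p in joint.items()
               if p > 0 and math.log2(p / (p_s[s] * p_y[y])) > n * gamma)

# Hypothetical toy setup: n = 4, two parity checks, uniform inputs, BSC(0.1).
H = [[1, 1, 0, 0], [0, 0, 1, 1]]
q = [0.5, 0.5]
w = [[0.9, 0.1], [0.1, 0.9]]
tail = [prob_density_exceeds(H, q, w, g) for g in (-10.0, 0.0, 0.1, 1.0)]
```

The list `tail` is non-increasing in $\gamma$, which makes the trade-off concrete: raising $\gamma$ shrinks $P(\mathcal{A})$ but also lowers the admissible rate $I(X;Y) - \gamma$.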


C. Parity-check constraints 


In this part, we continue the above discussion for the
important special case of parity-check constraints. For simplicity,
we restrict the discussion to channels with binary input
alphabets, $\mathcal{X} = \{0,1\}$. We will identify $\mathcal{X}$ with the binary
field $\mathbb{F}_2$ and use vector space operations over $\mathbb{F}_2$ to define
the code constraints. The joint ensemble for the system will
still be $(M, S, \mathcal{C}, Y^n, \hat{M})$ with a probability assignment (12),
except here we will consider a constraint function $\phi$ defined
in terms of a parity-check matrix $H \in \mathbb{F}_2^{r \times n}$ with $r$ rows and
$n$ columns. We leave $r$ as an arbitrary parameter, $0 \le r \le n$,
through the following analysis and discuss its effect on the
results following the analysis. We take the constraint set as
$\mathcal{S} = \mathbb{F}_2^r$ and for any $(s, x^n) \in \mathcal{S} \times \mathcal{X}^n$ define the constraint
function as

$$\phi(s,x^n) = \begin{cases} 1, & \text{if } s = x^n H^T, \\ 0, & \text{otherwise.} \end{cases} \tag{17}$$

Note that $\phi$ is symmetric with $a_s = 1$ for every $s \in \mathcal{S}$. Also
note that $\phi$ splits the set $\mathcal{X}^n$ into cosets $\mathcal{X}_s = \{x^n \in \mathcal{X}^n : x^n H^T = s\}$
indexed by $s \in \mathcal{S}$. Each coset has $|\mathcal{X}_s| = 2^{n-r}$
elements and $\mathcal{X}_s = x_s^n + \mathcal{X}_0$ where $x_s^n \in \mathcal{X}_s$ is a coset
representative for $\mathcal{X}_s$ and $\mathcal{X}_0$ denotes the coset for $s = 0^r$.

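Lemma 2. Let $\mathcal{A}$ be as in (14) with $\gamma = \frac{1}{n} I(S;Y^n) + \epsilon$ for
some $\epsilon > 0$. Then, for the parity-check code ensemble,

$$P(\mathcal{A}) \le \exp\left\{-n\, \frac{2\epsilon^2}{d}\right\}, \tag{18}$$

where $d$ is a constant determined by $q(x)$ and $w(y|x)$.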

Proof: Note that $i(S;Y^n) = f(X^n, Y^n)$ where $f(x^n,y^n) = i(x^n H^T; y^n)$.
Writing $i(S;Y^n)$ in this way as a function of
$(X^n, Y^n)$ is useful because the function $f$ is Lipschitz: Let
$(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ and $(\tilde{x}^n, \tilde{y}^n) \in \mathcal{X}^n \times \mathcal{Y}^n$ be any two
points such that (a) $(x_i, y_i) \ne (\tilde{x}_i, \tilde{y}_i)$ for some $i \in [1 : n]$
but $(x_j, y_j) = (\tilde{x}_j, \tilde{y}_j)$ for all $j \ne i$, $1 \le j \le n$, and (b)
$q^n(x^n)\, w^n(y^n|x^n) > 0$ and $q^n(\tilde{x}^n)\, w^n(\tilde{y}^n|\tilde{x}^n) > 0$. We claim
that

$$|f(x^n,y^n) - f(\tilde{x}^n,\tilde{y}^n)| \le d_f \tag{19}$$

for some constant $d_f$ that depends only on the distributions
$q(x)$ and $w(y|x)$.

Assuming for a moment that the claim (19) is true, the
lemma follows from the Azuma-Hoeffding inequality, specifically,
from the form of this inequality as given in [2,
Corol. 5.2], with $d = \frac{1}{n} \sum_{i=1}^{n} d_f^2 = d_f^2$. Therefore, it suffices to prove
only (19), or equivalently,

$$2^{-d_f} \le 2^{f(x^n,y^n) - f(\tilde{x}^n,\tilde{y}^n)} \le 2^{d_f}.$$


To that end, we write

$$2^{f(x^n,y^n) - f(\tilde{x}^n,\tilde{y}^n)} = \left(\frac{p_{S,Y^n}(s,y^n)}{p_{S,Y^n}(\tilde{s},\tilde{y}^n)}\right) \left(\frac{p_S(\tilde{s})}{p_S(s)}\right) \left(\frac{p_{Y^n}(\tilde{y}^n)}{p_{Y^n}(y^n)}\right)$$

where we put for shorthand $s = x^n H^T$, $\tilde{s} = \tilde{x}^n H^T$. Using the
coset structure of the constraints, we have

$$p_{S,Y^n}(s,y^n) = \sum_{\bar{x}^n \in \mathcal{X}^n} p_S(s)\, q_s(\bar{x}^n)\, w^n(y^n|\bar{x}^n)$$
$$= \sum_{\bar{x}^n \in \mathcal{X}^n} \phi(s,\bar{x}^n)\, q^n(\bar{x}^n)\, w^n(y^n|\bar{x}^n)$$
$$= \sum_{\bar{x}^n \in \mathcal{X}_s} q^n(\bar{x}^n)\, w^n(y^n|\bar{x}^n)$$
$$= \sum_{\bar{x}^n \in \mathcal{X}_0} q^n(x^n + \bar{x}^n)\, w^n(y^n|x^n + \bar{x}^n).$$

Thus, we have

$$\frac{p_{S,Y^n}(s,y^n)}{p_{S,Y^n}(\tilde{s},\tilde{y}^n)} = \frac{\sum_{\bar{x}^n \in \mathcal{X}_0} q^n(x^n + \bar{x}^n)\, w^n(y^n|x^n + \bar{x}^n)}{\sum_{\bar{x}^n \in \mathcal{X}_0} q^n(\tilde{x}^n + \bar{x}^n)\, w^n(\tilde{y}^n|\tilde{x}^n + \bar{x}^n)}.$$

Now, term by term, we have the bound

$$\frac{q^n(x^n + \bar{x}^n)\, w^n(y^n|x^n + \bar{x}^n)}{q^n(\tilde{x}^n + \bar{x}^n)\, w^n(\tilde{y}^n|\tilde{x}^n + \bar{x}^n)} = \frac{q(x_i + \bar{x}_i)\, w(y_i|x_i + \bar{x}_i)}{q(\tilde{x}_i + \bar{x}_i)\, w(\tilde{y}_i|\tilde{x}_i + \bar{x}_i)} \le \beta_{q,w},$$

where

$$\beta_{q,w} = \frac{\max\{q(x)\, w(y|x) : (x,y) \in \mathrm{supp}(q(x)\, w(y|x))\}}{\min\{q(x)\, w(y|x) : (x,y) \in \mathrm{supp}(q(x)\, w(y|x))\}},$$

and "supp" denotes the support of a distribution. So,

$$\frac{1}{\beta_{q,w}} \le \frac{p_{S,Y^n}(s,y^n)}{p_{S,Y^n}(\tilde{s},\tilde{y}^n)} \le \beta_{q,w}.$$

Using the same type of argument, we get

$$\frac{1}{\beta_q} \le \frac{p_S(\tilde{s})}{p_S(s)} \le \beta_q, \qquad \frac{1}{\beta_t} \le \frac{p_{Y^n}(\tilde{y}^n)}{p_{Y^n}(y^n)} \le \beta_t,$$

where $\beta_q$ is defined as the ratio of $\max\{q(x) : x \in \mathrm{supp}(q(x))\}$
to $\min\{q(x) : x \in \mathrm{supp}(q(x))\}$ and $\beta_t$ as the ratio of $\max\{t(y) : y \in \mathrm{supp}(t(y))\}$
to $\min\{t(y) : y \in \mathrm{supp}(t(y))\}$. Combining
these, we obtain the proof of (19) with $d_f = \log_2(\beta_{q,w}\, \beta_q\, \beta_t)$.
The lemma follows, with $d = (\log_2(\beta_{q,w}\, \beta_q\, \beta_t))^2$. $\blacksquare$

This shows that $P(\mathcal{A})$ goes to zero exponentially in $n$
regardless of the size (number of rows $r$) and form of $H$;
it should be clear, however, that the specific form of $H$
affects the rate penalty $\frac{1}{n} I(S;Y^n)$. To gain a more intuitive
understanding of this issue, let us interpret $I(S;Y^n)$ as the
average information leaked by the received word $Y^n$ about
the constraint $S$ in a one-shot transmission scenario where a
codeword $X^n$ satisfying the constraint $\phi(S, X^n) = 1$ is sent.
From this perspective, we may expect that the larger the
number of parity checks and the more sparse they are
(involving fewer codeword digits), the larger will be the
leakage. As a trivial example, we have $H = I_n$ (the identity
matrix) with $I(S;Y^n) = I(X^n;Y^n) = n I(X;Y)$, corresponding
to maximum information leakage. A non-trivial example
in the same vein is Gallager's proof [3, §3.8] that $I(S;Y^n)$
is bounded away from zero when $H$ is the parity-check
matrix of a regular LDPC code of a given rate. At the other
extreme, we have the well-known fact that random parity-check
codes achieve capacity, which a fortiori implies that
$I(S;Y^n)$ is typically $o(n)$.
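
The dependence of the leakage on the form of $H$ can be seen numerically on toy examples. The sketch below computes $I(S;Y^n)$ exactly for $n = 3$, once with $H = I_n$ and once with a single overall parity check; the function names, the uniform input distribution, and the binary symmetric channel with crossover 0.1 are assumptions made for the illustration.

```python
import itertools
import math

def leakage(H, q, w):
    """Exact I(S; Y^n) in bits for S = X^n H^T, by enumeration over all (x^n, y^n)."""
    n = len(H[0])
    joint = {}
    for x in itertools.product([0, 1], repeat=n):
        s = tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)
        px = math.prod(q[b] for b in x)
        for y in itertools.product([0, 1], repeat=n):
            p = px * math.prod(w[a][b] for a, b in zip(x, y))
            joint[(s, y)] = joint.get((s, y), 0.0) + p
    p_s, p_y = {}, {}
    for (s, y), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log2(p / (p_s[s] * p_y[y]))
               for (s, y), p in joint.items() if p > 0)

def h2(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical toy setup: uniform inputs over a BSC with crossover 0.1, n = 3.
q = [0.5, 0.5]
w = [[0.9, 0.1], [0.1, 0.9]]
leak_identity = leakage([[1, 0, 0], [0, 1, 0], [0, 0, 1]], q, w)
leak_single = leakage([[1, 1, 1]], q, w)
per_letter = 1 - h2(0.1)   # I(X;Y) for this channel and input distribution
```

With these assumptions `leak_identity` equals `3 * per_letter`, the maximum-leakage case, while the single dense check leaks far less, in line with the intuition that dense parity checks reveal little about $S$.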


IV. POLAR PARITY-CHECK MATRICES 

In this part, we apply the results of Sect. III-C to the
situation where $H$ is a parity-check matrix derived from
polar coding and show that there is no rate penalty in this
case. For brevity, we will refer to parity-check matrices obtained
from polar coding as "polar parity-check" matrices.
We first give a brief description of polar codes; for details,
we refer to [4]. Let $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and let $G_\ell = F^{\otimes \ell}$ denote the $\ell$th
Kronecker power of $F$. Note that $G_\ell$ is an $n \times n$ matrix
with $n = 2^\ell$ and its inverse is itself, $G_\ell^{-1} = G_\ell$. Polar codes
are defined in terms of the mapping $x^n = u^n G_\ell$ where $x^n$
denotes the codeword and $u^n$ denotes the source word.
In polar coding we "freeze" a certain subset of coordinates
of the source word $u^n$ and insert the data payload in the
remaining portion of $u^n$. To be specific, let $\mathcal{F} \subset [1 : n]$
denote the indices marking the frozen part of $u^n$ and let
$u_{\mathcal{F}} = (u_i : i \in \mathcal{F})$ denote the frozen part. By convention,
we set $u_{\mathcal{F}} = s$ for some fixed pattern $s \in \mathbb{F}_2^{|\mathcal{F}|}$ and keep
this part unchanged from one transmission to the next, while
we leave the other part $u_{\mathcal{F}^c}$ free. The parity-check matrix
for polar codes can be derived as follows. We begin with
the definition that a word $x^n$ is a polar codeword iff
$x^n = u^n G_\ell$ for some $u^n$ with $u_{\mathcal{F}} = s$. Using the inverse
relation $u^n = x^n G_\ell^{-1}$, we obtain that $x^n$ is a codeword iff
$s = (x^n G_\ell^{-1})_{\mathcal{F}}$. Next, we observe that

$$(x^n G_\ell^{-1})_{\mathcal{F}} = x^n (G_\ell^{-1})_{\mathcal{F}}$$

where $(G_\ell^{-1})_{\mathcal{F}}$ denotes the submatrix of $G_\ell^{-1}$ obtained by
taking the columns with indices in $\mathcal{F}$. Thus, we obtain a
parity-check matrix for polar codes, namely,

$$H = \left[(G_\ell^{-1})_{\mathcal{F}}\right]^T. \tag{20}$$

Now, we consider Lemma 2 in connection with an ensemble
$(S, X^n, Y^n)$ based on a polar parity-check matrix. We annex
to this ensemble the random vector $U^n = X^n G_\ell^{-1}$ that
corresponds to the source word in polar coding so that we
have the relation

$$S = (X^n G_\ell^{-1})_{\mathcal{F}} = U_{\mathcal{F}}.$$
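
The relation between the syndrome and the frozen part of the source word is easy to confirm numerically. The following sketch builds $G_\ell$ as a Kronecker power, takes $H^T$ to be the columns of $G_\ell^{-1} = G_\ell$ indexed by the frozen set, and checks that $x^n H^T$ recovers $u_{\mathcal{F}}$. The helper names, the 0-based indexing, and the particular frozen set are illustrative assumptions; the frozen set here is arbitrary rather than produced by a polar design rule.

```python
import numpy as np

F = np.array([[1, 0], [1, 1]], dtype=int)

def kronecker_power(ell):
    """Return G_ell, the ell-th Kronecker power of F (entries stay in {0, 1})."""
    G = np.array([[1]], dtype=int)
    for _ in range(ell):
        G = np.kron(G, F)
    return G

def polar_parity_check(ell, frozen):
    """Return H whose transpose is the column submatrix (G_ell^{-1})_F."""
    G_inv = kronecker_power(ell)       # G_ell is its own inverse over F_2
    return G_inv[:, frozen].T

# Hypothetical example: n = 8, arbitrary frozen set with 0-based indices.
ell, frozen = 3, [0, 1, 2, 4]
G = kronecker_power(ell)
H = polar_parity_check(ell, frozen)

rng = np.random.default_rng(0)
u = rng.integers(0, 2, size=2 ** ell)  # random source word u^n
x = u @ G % 2                          # codeword x^n = u^n G_ell
s = x @ H.T % 2                        # syndrome s = x^n H^T
```

The syndrome `s` coincides with `u[frozen]`, so fixing the parity values $S$ is the same as fixing the frozen coordinates of the source word.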















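We wish to show that if $\mathcal{F}$ is chosen using the usual polar
code design rules, then the rate penalty $\frac{1}{n} I(S;Y^n)$ will be
negligible. The specific design rule we use here fixes a $\beta < 1/2$
and selects the frozen set as

$$\mathcal{F} = \left\{i \in [1 : n] : H(U_i|Y^n, U^{i-1}) > 2^{-n^\beta}\right\}. \tag{21}$$

Now, by standard facts about the entropy function, we have

$$I(U_{\mathcal{F}};Y^n) \stackrel{(a)}{=} \sum_{i \in \mathcal{F}} I(U_i;Y^n|U_{\mathcal{F}_i})$$
$$= \sum_{i \in \mathcal{F}} \left[H(U_i|U_{\mathcal{F}_i}) - H(U_i|Y^n, U_{\mathcal{F}_i})\right]$$
$$\le \sum_{i \in \mathcal{F}} \left[1 - H(U_i|Y^n, U^{i-1})\right]$$
$$\stackrel{(b)}{\le} |\mathcal{M}| + \sum_{i \in \mathcal{H}} 2^{-n^\beta}$$
$$\stackrel{(c)}{\le} o(n) + n\, 2^{-n^\beta} = o(n)$$

where in (a) we defined $\mathcal{F}_i = \{j \in \mathcal{F} : j \le i - 1\}$, in (b) split
$\mathcal{F}$ into

$$\mathcal{M} = \left\{i \in [1 : n] : 2^{-n^\beta} < H(U_i|Y^n, U^{i-1}) < 1 - 2^{-n^\beta}\right\}$$

and

$$\mathcal{H} = \left\{i \in [1 : n] : H(U_i|Y^n, U^{i-1}) \ge 1 - 2^{-n^\beta}\right\},$$

and in (c) used polarization results [5] to write the bound
$|\mathcal{M}| = o(n)$. The third line uses $H(U_i|U_{\mathcal{F}_i}) \le 1$ and the fact that
conditioning on all of $U^{i-1}$ instead of $U_{\mathcal{F}_i}$ cannot increase entropy.
Thus, by Lemma 1 and Lemma 2 we conclude
that $I(S;Y^n)$ is $o(n)$, so that the rate penalty $\frac{1}{n} I(S;Y^n)$ vanishes,
and $I(X;Y)$ is achievable using the polar parity-check ensemble.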
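
To make the design rule concrete, the sketch below evaluates it in a special case that is not treated in the paper and is assumed here only for illustration: a binary erasure channel with uniform inputs, where the conditional entropies $H(U_i|Y^n, U^{i-1})$ are the erasure probabilities of the synthesized channels and follow the well-known recursion $z \mapsto 2z - z^2$, $z \mapsto z^2$. The function names and the parameters (erasure probability 0.5, $\ell = 10$, $\beta = 0.4$) are hypothetical. The order in which the recursion lists the indices depends on the bit-ordering convention of the transform, so only the size of the frozen set is examined.

```python
def bec_conditional_entropies(eps, ell):
    """Erasure probabilities of the 2**ell synthesized channels of a BEC(eps)."""
    z = [eps]
    for _ in range(ell):
        # Each channel splits into a degraded one (2e - e^2) and an upgraded one (e^2).
        z = [v for e in z for v in (2 * e - e * e, e * e)]
    return z

def frozen_set(eps, ell, beta):
    """Indices whose conditional entropy exceeds the threshold 2^(-n^beta)."""
    n = 2 ** ell
    threshold = 2.0 ** (-(n ** beta))
    z = bec_conditional_entropies(eps, ell)
    return [i for i, e in enumerate(z) if e > threshold]

# Hypothetical parameters: BEC(0.5), n = 1024, beta = 0.4 (threshold 2^-16).
eps, ell, beta = 0.5, 10, 0.4
n = 2 ** ell
z = bec_conditional_entropies(eps, ell)
frozen = frozen_set(eps, ell, beta)
frozen_fraction = len(frozen) / n
```

Because the recursion preserves the average of the conditional entropies, `sum(z) / n` stays at the erasure probability, and `frozen_fraction` is at least that large at any finite $n$; the polarization results cited above are what guarantee that the excess is $o(1)$.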

The number of constraints imposed by polar parity-checks
is $|\mathcal{F}|$, which is $nH(X|Y) + o(n)$ [5]. The dimensionality
of the ensemble $X^n$ is reduced from $nH(X) + o(n)$
to $nI(X;Y) + o(n)$ by the polar parity-checks; this is the
smallest possible dimensionality (to order $O(n)$) for an
ensemble that achieves $I(X;Y)$.

We refrained from calling the codes generated under 
polar parity-checks “polar codes” because there are major 
differences between the two classes of codes. To discuss 
this further, let us refer to the polar parity-check codes of 
this paper as PPC codes and reserve the term “polar code” 
for ordinary polar codes as defined in [4]. The results of
this paper establish that PPC codes achieve $I(X;Y)$ with a
probability of error that goes to zero exponentially in $n$,
while for polar codes the exponent is not better than $\sqrt{n}$
even under ML decoding. The $\sqrt{n}$ exponent arises from
the fact that a code generated
by a submatrix of $G_\ell$ cannot have a minimum distance
better than $O(\sqrt{n})$ for any fixed non-zero code rate. It
must be that on average PPC codes have a minimum 
distance proportional to n; otherwise, their error exponent 
would not be proportional to n. This significant increase in 
minimum distance can be attributed to random selection 
of codewords; a PPC code may be seen as an expurgated 
polar code. The expurgation removes the defects in the 
polar code; but it also destroys the linear structure in the 
code. In standard polar coding, the mapping from messages
to codewords is a linear relation of the form $x^n = u^n G_\ell$,
which can be implemented in complexity $O(n \log n)$. Under
PPC coding, there is no linear relationship of this type
between data bits and codewords; hence, one can no longer
claim that the encoding complexity is $O(n \log n)$. Thus,
PPC codes show a gain in performance at the expense of 
giving up the low-complexity encoding properties of polar 
codes. Clearly, similar remarks apply to the complexity of 
decoding. 

For PPC codes, achieving $I(X;Y)$ under an arbitrary
target distribution $q(x)$ is no different than achieving it
under a uniform $q(x)$. With polar codes, achieving $I(X;Y)$
for a non-uniform $q(x)$ is not a straightforward task; it
requires extension of the standard method and employing
common randomness between the encoder and decoder
in order to shape the channel input distribution [6]. With
PPC codes, the shaping is built into the code selection 
procedure. 

V. Summary 

The main motivation for this work has been to develop a 
packing lemma for polar codes that would enable translation
of proofs by standard packing lemmas to similar
results for polar coding. More work needs to be done to 
accomplish this broader goal. The main contribution of 
the paper has been the development of a technique for 
analyzing the performance of a random code ensemble 
defined by a fixed parity-check matrix. In this sense, the 
results may have relevance to a broader class of codes than 
polar codes. An interesting observation in the paper has 
been that the polar parity-check ensemble shows markedly 
better performance than the standard polar code of the 
same size. A better understanding of this phenomenon may 
be useful in designing better polar codes. 